Floating-Point Arithmetic on Round-to-Nearest Representations

Authors

Peter Kornerup

Jean-Michel Muller

Adrien Panhaleux

Abstract

Recently we introduced a class of number representations denoted RN-representations, allowing an unbiased rounding-to-nearest to take place by a simple truncation. In this paper we briefly review the binary fixed-point representation in an encoding which is essentially an ordinary 2’s complement representation with an appended round-bit. Not only is this rounding a constant-time operation, so is sign inversion, both of which are at best log-time operations on ordinary 2’s complement representations. Addition, multiplication and division are defined in such a way that rounding information can be carried along in a meaningful way, at minimal cost. Based on the fixed-point encoding we here define a floating-point representation, and describe in some detail a possible implementation of a floating-point arithmetic unit employing this representation, including also the directed roundings.
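The two constant-time operations mentioned in the abstract can be illustrated with a small Python sketch. It rests on an assumption that the abstract itself does not spell out: that a fixed-point number is stored as a pair (a, r), where a is an ordinary 2's complement integer, r is the appended round-bit, and the represented value is (a + r) units in the last place. The function names `value`, `truncate` and `negate` are hypothetical and chosen only for this illustration; they are not taken from the paper.

```python
from fractions import Fraction


def value(a: int, r: int, ulp_exp: int = 0) -> Fraction:
    """Value of the pair (a, r) under the ASSUMED convention (a + r) * 2**ulp_exp."""
    return Fraction(a + r) * Fraction(2) ** ulp_exp


def truncate(a: int, r: int, k: int) -> tuple[int, int]:
    """Round to nearest by dropping the k low bits of a (k >= 1).

    The new round-bit is the most significant discarded bit of a.
    The old round-bit r is simply discarded in this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Python's >> on ints is an arithmetic (floor) shift, matching 2's complement.
    return a >> k, (a >> (k - 1)) & 1


def negate(a: int, r: int) -> tuple[int, int]:
    """Sign inversion: invert every bit of a and the round-bit, no carry needed."""
    return ~a, 1 - r


if __name__ == "__main__":
    a, r = 0b1011, 0                 # value 11 at unit 2**0
    a2, r2 = truncate(a, r, 2)       # drop two bits, new unit is 2**2
    print(value(a, r), "->", value(a2, r2, 2))
    print(value(*negate(a, r)))
```

Under this convention the result of `truncate` is always within half a new unit of the original value, and `negate` needs no carry propagation, which is the kind of behaviour the abstract contrasts with ordinary 2's complement. How ties and the carried rounding information are treated in the actual arithmetic unit is described in the paper and is not modelled here.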


Similar resources

Dynamical Control of Computations Using the Family of Optimal Two-point Methods to Solve Nonlinear Equations

One of the considerable discussions for solving the nonlinear equations is to find the optimal iteration, and to use a proper termination criterion which is able to obtain a high accuracy for the numerical solution. In this paper, for a certain class of the family of optimal two-point methods, we propose a new scheme based on the stochastic arithmetic to find the optimal number of iterations in...


Formal Methods Applied to a Floating-Point Number System

This report presents a formalisation of the IEEE standard for binary floating-point arithmetic in the set-theoretic specification language Z. The formal specification is refined into four sequential components which unpack the operands, perform the arithmetic, pack and round the result. This refinement follows proven rules and so demonstrates a mathematically rigorous method of program developm...


A Constructive Criticism of the C/C++ Proposal for Complex Arithmetic

The IEEE 754 and 854 standards regulate the behaviour of real floating-point arithmetic, as implemented in most current hard- and software systems. Although a myriad of libraries for complex floating-point arithmetic is available and in use, there is no general consensus on their implementation. The International C Standard describes in its Annex G guidelines for the implementation of complex ari...


Sharp ULP rounding error bound for the hypotenuse function

The hypotenuse function, z = √(x² + y²), is sometimes included in math library packages. Assuming that it is being computed by a straightforward algorithm, in a binary floating point environment, with round to nearest rounding mode, a sharp roundoff error bound is derived, for arbitrary precision. For IEEE single precision, or higher, the bound implies that |z − z| < 1.222 ulp(z) and |z − z| < 1...


Error bounds on complex floating-point multiplication with an FMA

The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff u, we show that their bound √5 u on the normwise relative error |ẑ/z − 1| of a complex product z c...


Journal:

CoRR

Volume abs/1201.3914

Publication date 2011
